Commentary on "The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators"
Abstract
In the field of prediction with expert advice, a standard goal is to sequentially predict data as well as the best expert in some reference set of "expert predictors". Universal data compression, a subfield of information theory, can be thought of as a special case. Here, the set of expert predictors is a statistical model, i.e. a family of probability distributions, and the predictions are scored using the logarithmic loss function, which, via the Kraft inequality, gives the procedure an interpretation in terms of data compression. A prediction strategy is a function that, for each n, given data x^n ≡ x_1, ..., x_n, outputs a "predictive" probability distribution p(· | x^n) for X_{n+1}. For a given model M, the Shtarkov or Normalized Maximum Likelihood (NML) strategy relative to M is the prediction strategy that achieves the minimax optimal individual-sequence regret relative to M. NML has a number of drawbacks, detailed below, and is therefore often approximated by more convenient strategies such as Sequential Normalized Maximum Likelihood (SNML) or the Bayesian strategy. The latter predicts using the Bayesian predictive distribution for the model M, defined relative to some prior π, which is often taken to be Jeffreys' prior; in that case we abbreviate it to J.B. The text below has been written so as to be (hopefully) understandable for readers who do not know too many details of these concepts; for such details, see e.g. Grünwald (2007) and/or Kotlowski and Grünwald (2011) (KG from now on).
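As a concrete illustration (not part of the original commentary): for the Bernoulli model, the J.B. strategy with Jeffreys' prior Beta(1/2, 1/2) reduces to the well-known Krichevsky–Trofimov predictive rule, which predicts a 1 with probability (k + 1/2)/(n + 1) after seeing k ones in n outcomes. The minimal sketch below (example sequence and function names are illustrative, not from the paper) computes the cumulative log loss of this strategy on a binary sequence and its regret against the best Bernoulli expert in hindsight, i.e. the maximum-likelihood distribution.

```python
import math

def jeffreys_bernoulli_predictive(xs):
    """Probability of the next outcome being 1 under the Jeffreys-prior
    Bayesian (Krichevsky-Trofimov) strategy, given binary history xs."""
    n, k = len(xs), sum(xs)
    return (k + 0.5) / (n + 1.0)

def cumulative_log_loss(xs):
    """Sequential log loss (code length, in nats) of the J.B. strategy on xs."""
    loss = 0.0
    for i, x in enumerate(xs):
        p1 = jeffreys_bernoulli_predictive(xs[:i])
        loss -= math.log(p1 if x == 1 else 1.0 - p1)
    return loss

def best_expert_log_loss(xs):
    """Log loss of the best Bernoulli expert in hindsight (maximum likelihood)."""
    n, k = len(xs), sum(xs)
    if k in (0, n):        # ML probability is 0 or 1: loss is exactly 0
        return 0.0
    p = k / n
    return -(k * math.log(p) + (n - k) * math.log(1.0 - p))

xs = [1, 0, 1, 1, 0, 1, 1, 1]          # illustrative data
regret = cumulative_log_loss(xs) - best_expert_log_loss(xs)
```

The individual-sequence regret computed here is what NML minimizes in the worst case; for J.B. it grows like (1/2) log n plus a constant, which is what makes Jeffreys' prior asymptotically competitive with the minimax NML strategy.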